-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unstack()
support for non-multiindexed dataframes
#7054
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-0.18 #7054 +/- ##
===============================================
+ Coverage 82.09% 82.12% +0.02%
===============================================
Files 97 97
Lines 16477 16487 +10
===============================================
+ Hits 13527 13540 +13
+ Misses 2950 2947 -3
Continue to review full report at Codecov.
|
python/cudf/cudf/core/reshape.py
Outdated
|
||
Unstacking single level index dataframe: | ||
|
||
>>> df.unstack(['b', 'd']).unstack() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this example is a little opaque - it's sometimes difficult to visualize exactly what the result of unstack
should be for even a single level, and here I find it a little hard to connect to dots through the chained operation. I'd recommend an example that starts with a dataframe with a single index and shows the result of unstacking that dataframe into a series instead.
python/cudf/cudf/core/reshape.py
Outdated
"is not supported" | ||
) | ||
if isinstance(df, cudf.DataFrame): | ||
res = df.T.stack(dropna=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this pass the typecasting behavior off to transpose? Should we check the dtypes and possibly error here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like both transpose.pyx
and libcudf::transpose.cu
checks whether all columns have the same datatype. A clear exception gets raised if the columns are of different types. Should we check again here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would support checking here - imagining what happens here from the user perspective, if I get an error trying to unstack
a cuDF dataframe, I might wonder why the transpose code is unhappy.
In general, I think we try and avoid letting libcudf itself serve an error to the user and favor a more surface level python error, usually when I've managed to actually manifest a libcudf error from the python API it means something is very wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of questions otherwise LGTM.
python/cudf/cudf/core/reshape.py
Outdated
) | ||
if isinstance(df, cudf.DataFrame): | ||
dtype = df._columns[0].dtype | ||
if any( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't matter much in most cases but I think it's slightly more efficient to use a single loop that raises if df._columns[i].dtype != dtype
as opposed to creating a list with all the booleans and then doing an any
reduction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raised an issue since this comes up in a bunch of places #7067
unstack()
support for non-multiindexed dataframesunstack()
support for non-multiindexed dataframes
rerun tests |
Closes #6694
When
unstack()
receives a dataframe with "single" index, returns a series to match pandas behavior.